lib: os: mpsc_pbuf: fix potential semaphore wait forever #97931

fei315412-cmyk · 2025-10-20T13:51:50Z

One thread calls mpsc_pbuf_alloc to produce data, which invokes add_skip_item and steps into k_sem_take.

Another thread calls mpsc_pbuf_claim to consume data. In this situation only a small amount of space remains, so mpsc_pbuf_claim calls rd_idx_inc to advance past it, but there is still no data available to claim.

The consumer should call k_sem_give to wake mpsc_pbuf_alloc again, so the producer can allocate space and continue producing data.

Without this wake-up, the producer thread may wait forever in k_sem_take, leading to a deadlock situation.
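
The scenario described above can be sketched with a small single-threaded model. This is only an illustration under stated assumptions, not the Zephyr implementation: the `toy_*` names, the struct fields, and the word counts are all hypothetical, and the semaphore is reduced to a plain counter. It shows that releasing space on the consumer side is not enough, because a producer that is already blocked only resumes when the semaphore is given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not the Zephyr API. */
struct toy_pbuf {
	uint32_t free_words; /* words the producer may allocate right now */
	uint32_t skip_words; /* padding (skip item) sitting at the read index */
	uint32_t sem_count;  /* stand-in for the semaphore the producer waits on */
};

/* Producer: true if the allocation fits, false if it would have to block. */
static bool toy_alloc(struct toy_pbuf *b, uint32_t wlen)
{
	if (b->free_words >= wlen) {
		b->free_words -= wlen;
		return true;
	}
	return false;
}

/* Consumer: the claim finds only a skip item, drops it (which releases
 * space) and has no data to return. give_sem selects whether the
 * semaphore is signalled after the space is released.
 */
static void toy_claim_skip_only(struct toy_pbuf *b, bool give_sem)
{
	if (b->skip_words == 0) {
		return;
	}
	b->free_words += b->skip_words;
	b->skip_words = 0;
	if (give_sem) {
		b->sem_count++;
	}
}

/* A blocked producer resumes only if it can take the semaphore. */
static bool toy_producer_wakes(struct toy_pbuf *b)
{
	if (b->sem_count == 0) {
		return false;
	}
	b->sem_count--;
	return true;
}

/* Returns true if a producer that blocked on 8 words eventually allocates. */
static bool toy_scenario(bool give_sem)
{
	struct toy_pbuf b = { .free_words = 2, .skip_words = 6, .sem_count = 0 };
	bool first = toy_alloc(&b, 8);

	assert(!first); /* only 2 words are free, so the producer blocks */
	(void)first;

	toy_claim_skip_only(&b, give_sem);
	return toy_producer_wakes(&b) && toy_alloc(&b, 8);
}
```

In this model `toy_scenario(false)` leaves the producer blocked even though 8 words are free after the claim, while `toy_scenario(true)` lets it wake up and allocate. Because the claim returned no item, the consumer has nothing to free afterwards, so no later call gives the semaphore either.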

github-actions · 2025-10-20T13:52:36Z

Hello @fei315412-cmyk, and thank you very much for your first pull request to the Zephyr project!
Our Continuous Integration pipeline will execute a series of checks on your Pull Request commit messages and code, and you are expected to address any failures by updating the PR. Please take a look at our commit message guidelines to find out how to format your commit messages, and at our contribution workflow to understand how to update your Pull Request. If you haven't already, please make sure to review the project's Contributor Expectations and update (by amending and force-pushing the commits) your pull request if necessary.
If you are stuck or need help please join us on Discord and ask your question there. Additionally, you can escalate the review when applicable. 😊

fei315412-cmyk · 2025-10-27T01:57:55Z

@nordic-krch @dcpleung This bug was found in production. Please provide any suggestions, thanks.

dcpleung · 2025-10-27T02:35:49Z

I am not familiar with mpsc_pbuf.

nordic-krch · 2025-10-27T05:50:08Z

@fei315412-cmyk But after calling mpsc_pbuf_claim (which does not call k_sem_give), the user shall eventually call mpsc_pbuf_free on that buffer, and that calls k_sem_give, so I don't see how that deadlock could occur. Can you add a test case that triggers that behavior?

fei315412-cmyk · 2025-10-27T11:25:55Z

@nordic-krch Can you tell me how to compile and run the mpsc_pbuf test cases? Please give me an example, and I will write a test case and run it.

nordic-krch · 2025-10-27T14:26:20Z

For example:

west build -p -b qemu_x86 tests/lib/mpsc_pbuf/ -T libraries.mpsc_pbuf.concurrent
west build -t run

fei315412-cmyk · 2025-10-29T08:10:09Z

@nordic-krch
PR submitted. Please review when you have time; I am waiting for your reply.
Without this commit, mpsc_pbuf_alloc blocks in k_sem_take; this commit is needed to wake it up.

Backtrace of the blocked thread:
Thread 28 (Thread 0xf5fffb40 (LWP 30265) "test_sema_lock"):
#0 0xf7eeadf9 in __kernel_vsyscall ()
#1 0xf7ca8c42 in __libc_do_syscall () at ../sysdeps/unix/sysv/linux/i386/libc-do-syscall.S:39
#2 0xf7c19ee3 in __futex_abstimed_wait_common32 (private=, cancel=, abstime=, op=, expected=, futex_word=) at ./nptl/futex-internal.c:40
#3 __futex_abstimed_wait_common (futex_word=0x8ac58e8, expected=1, clockid=, abstime=0x0, private=0, cancel=true) at ./nptl/futex-internal.c:99
#4 0xf7c1a0df in __GI___futex_abstimed_wait_cancelable64 (futex_word=, expected=, clockid=, abstime=0x0, private=0) at ./nptl/futex-internal.c:139
#5 0xf7c267e2 in do_futex_wait (sem=sem@entry=0x8ac58e8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:116
#6 0xf7c2688b in __new_sem_wait_slow64 (sem=0x8ac58e8, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:284
#7 0x08056be7 in nct_sem_rewait (semaphore=0x8ac58e8) at /workdir/zephyr/zephyr-git-commit/scripts/native_simulator//common/src/nct.c:149
#8 nct_wait_until_allowed (tt_el=0x8ac58e0, this_th_nbr=25) at /workdir/zephyr/zephyr-git-commit/scripts/native_simulator//common/src/nct.c:179
#9 0x08056cff in nct_swap_threads (this_arg=0x8ac5410, next_allowed_thread_nbr=26) at /workdir/zephyr/zephyr-git-commit/scripts/native_simulator//common/src/nct.c:241
#10 0x080505ce in posix_swap (next_allowed_thread_nbr=, this_th_nbr=) at /workdir/zephyr/zephyr-git-commit/arch/posix/core/posix_core_nsi.c:38
#11 0x080503ea in arch_swap (key=0) at /workdir/zephyr/zephyr-git-commit/arch/posix/core/swap.c:64
#12 0x080530fe in z_swap_irqlock (key=) at /workdir/zephyr/zephyr-git-commit/kernel/include/kswap.h:216
#13 0x080529e5 in z_impl_k_sem_take (sem=0x806194c <mpsc_buffer+44>, timeout=...) at /workdir/zephyr/zephyr-git-commit/kernel/sem.c:158
#14 0x0804ea77 in k_sem_take (timeout=..., sem=0x806194c <mpsc_buffer+44>) at /workdir/zephyr/zephyr-git-commit/build/zephyr/include/generated/zephyr/syscalls/kernel.h:1158
#15 mpsc_pbuf_alloc (buffer=0x8061920 <mpsc_buffer>, wlen=452, timeout=...) at /workdir/zephyr/zephyr-git-commit/lib/os/mpsc_pbuf.c:385
#16 0x0804afd7 in log_buffer_test_sema_lock () at /workdir/zephyr/zephyr-git-commit/tests/lib/mpsc_pbuf/src/main.c:1149
#17 0x08050b5b in run_test_functions (suite=0x8061380 <z_ztest_test_node_log_buffer>, data=0x0, test=0x806155c <z_ztest_unit_test.log_buffer.test_sema_lock>) at /workdir/zephyr/zephyr-git-commit/subsys/testsuite/ztest/src/ztest.c:328

fei315412-cmyk · 2025-10-29T08:40:09Z

Test case result:

Without this commit:

Backtrace:

fei315412-cmyk · 2025-10-29T15:14:34Z

@nordic-krch @nashif @dcpleung Please provide any suggestions again, thanks.
I have also committed a test case that triggers this bug, and attached an image of this backtrace.

One thread calls mpsc_pbuf_alloc to produce data, which invokes add_skip_item and steps into k_sem_take.

Another thread calls mpsc_pbuf_claim to consume data. In this condition, mpsc_pbuf_claim has only small remaining space and needs to call rd_idx_inc to reserve space, but there is still no data available.

The consumer should call k_sem_give to wake mpsc_pbuf_alloc again, so the producer can allocate space and continue producing data.

Without this wake-up, the producer thread may wait forever in k_sem_take, leading to a deadlock situation.

Signed-off-by: Fei Wang <[email protected]>

sonarqubecloud · 2025-10-31T02:09:38Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

zephyrbot added the area: Logging label Oct 20, 2025

zephyrbot requested review from dcpleung and nordic-krch October 20, 2025 13:53

zephyrbot assigned nordic-krch Oct 20, 2025

fei315412-cmyk force-pushed the main branch from ca92770 to f378449 Compare October 29, 2025 08:04

zephyrbot added the area: Tests label Oct 29, 2025

zephyrbot requested a review from nashif October 29, 2025 08:05

fei315412-cmyk force-pushed the main branch 3 times, most recently from ba8e29e to 9e95451 Compare October 29, 2025 12:53

fei315412-cmyk force-pushed the main branch from 9e95451 to 4d2e119 Compare October 31, 2025 01:43
